Power Tools 1993 November

Text File | 1993-05-12 | 60.8 KB | 1,208 lines

HP'S HIGH AVAILABILITY SOLUTIONS

TABLE OF CONTENTS

INTRODUCTION
HIGH AVAILABILITY REQUIREMENTS
WHAT CAUSES DOWNTIME?
FAULT TOLERANT vs HIGHLY AVAILABLE SYSTEMS
HIGH AVAILABILITY SOLUTIONS
    RELIABLE SYSTEMS
    DATA AVAILABILITY
        DISK ARRAYS
        HP DATAPAIR/800
    SYSTEM AVAILABILITY
        SWITCHOVER/UX
HIGH AVAILABILITY DIRECTIONS
FAULT TOLERANT SOLUTIONS
APPENDIX A  SWITCHOVER/UX OPERATION
APPENDIX B  SWITCHOVER/UX QUESTIONS AND ANSWERS

HIGH AVAILABILITY SOLUTIONS

INTRODUCTION

The demand for greater system availability expands as businesses base their operation and competitiveness on mission critical applications - applications that are integral to the day-to-day operation of an organization. Because of market pressures and user expectations, the need for higher availability is greater than ever before. Services such as the following depend on continuous operation of computing resources:

* On-line order entry, funds transfer, message switching and realtime process monitoring.
* Operational services such as systems development capabilities and departmental services such as billing and personnel.
* Administrative support applications such as electronic mail, decision support and word processing.

The loss of computing capabilities can impact production, revenue, critical decision making, customer satisfaction and even human life.
The Gartner Group summarizes the evolution occurring in the 1990s as, "Technology is approaching 'no-compromise' 7x24x52 (7 days per week, 24 hours per day, 52 weeks per year) availability on a spectrum with substantial granularity. Users now can match required service levels to the 'availability investment' of a proposed solution."

System downtime can incur the following costs:

* Incomplete or late work resulting in penalties or loss of customer confidence and satisfaction
* Idle staff, because the system failure isolates employees from resources necessary to complete their tasks
* Underutilized capacity and loss of business opportunity
* Loss of human life

As the cost associated with downtime increases, greater levels of availability are financially justified. Hewlett Packard provides a broad range of availability solutions that includes reliable components, redundant components, recoverable systems and fault tolerance. This white paper discusses how Hewlett Packard is addressing these aspects of availability by providing solutions that prevent and eliminate the failures that cause downtime, whether they originate in hardware, software, operations, or the computing environment.

HIGH AVAILABILITY REQUIREMENTS

When planning a computing environment, there are several crucial elements which are typically considered. Among the major elements are performance, scalability, capacity, functionality, interoperability and availability. Market pressures and user expectations have forced system designers and implementers to consider availability as an integral part of the system solution. Organizations must incorporate higher availability solutions into their environments in a way which fits the goals for compatibility, connectivity, integration and cost-effectiveness. It is possible to measure the likely cost of downtime and determine the configuration or solution which addresses the level of availability required.
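As a rough illustration of measuring the likely cost of downtime, the sketch below converts an uptime percentage into expected hours of downtime per year and multiplies by an hourly cost. The two uptime percentages are the ones quoted later in the RELIABLE SYSTEMS section; the hourly cost and the function names are hypothetical and are not taken from this paper.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def annual_downtime_hours(uptime_percent):
    """Expected hours of downtime per year for a given uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


def annual_downtime_cost(uptime_percent, cost_per_hour):
    """Expected yearly cost of downtime; cost_per_hour is a hypothetical input."""
    return annual_downtime_hours(uptime_percent) * cost_per_hour


if __name__ == "__main__":
    cost_per_hour = 10_000.0  # hypothetical figure, not from this paper
    for uptime in (99.5, 99.97):
        hours = annual_downtime_hours(uptime)
        cost = annual_downtime_cost(uptime, cost_per_hour)
        print(f"{uptime}% uptime: {hours:.1f} h/year down, cost {cost:,.0f}")
```

Under these assumptions the gap between the two uptime levels is about 41.2 hours per year, so the financial case for a higher tier depends almost entirely on the hourly cost an organization assigns to an outage.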
There are several key requirements which today's computing environments must incorporate to address the availability needs of an organization's business. The three most crucial are:

1) data availability, integrity and recovery
2) minimal or no planned downtime
3) minimal unplanned downtime

How well a system performs on each of these requirements determines how well it can meet the defined needs. The first requirement, data availability and integrity, is the most critical since it refers to the accuracy, correctness, consistency and validity of the data. In environments where computer systems are constantly being relied upon to provide information for making critical business decisions, data must never be corrupted or permanently lost.

After the data is guaranteed intact, the second and third requirements together determine the system's availability. Planned downtime is the time a system is unavailable for predetermined periods to perform tasks like system maintenance, software or hardware upgrades and disk backups. Unplanned downtime is the time a system is unavailable due to unanticipated events like hardware failures, environmental errors and operator errors. In general, the computing resources must be designed so that neither planned nor unplanned downtime will negatively impact essential business operations.

WHAT CAUSES DOWNTIME?

In developing a strategy to address the basic system requirements, it is useful to understand why systems fail. The figure below illustrates the results of a Gartner Group study on downtime for both conventional and fault tolerant systems. As shown in the figure, downtime can be divided into four categories: hardware, software, people and environment.
Hardware
- Processors
- Peripherals such as disk drives, printers, etc.
- Memory

Software
- Operating System
- Layered software products such as transaction processing monitors and database management systems
- Application programs

People
- Management and operations personnel
- Support engineers

Environment
- Electrical power
- Catastrophic events such as fire, earthquake, flooding

Because hardware is only one component which contributes to system failures, it is generally not enough to address the issue of system availability with just fault tolerant processors. Systems must be designed to survive a broad spectrum of failures ranging from software and hardware faults to people and environmental problems. This paper details HP's current High Availability and Fault Tolerant product offering and defines the strategic direction.

FAULT TOLERANT vs HIGHLY AVAILABLE SYSTEMS

Fault tolerant systems are installed in environments that require no interruption of user service and absolute data integrity. The architecture of fault tolerant systems includes fully redundant components in a single package. The functionality handles hardware failures and also ensures application programs cannot be corrupted by errors resulting from any single point hardware failure. When a failure occurs, the recovery is virtually transparent to applications and users, with minimal performance degradation. Components of the system, such as CPU boards or memory, can be added or replaced on line, thereby minimizing the need for planned downtime.

Highly available systems are sometimes referred to as fault-resilient systems. They are capable of delivering some of the features (such as integrity of data on disks) and functions of fault tolerance; however, some system downtime will be associated with a system failure. Highly available systems are based on reliable computers, configured to significantly increase the availability of the system.
They recover from both software and hardware faults within minutes of detecting a problem. They are typically a loosely coupled configuration. Highly available solutions which incorporate multiple or redundant components are able to efficiently utilize the available resources under normal operating conditions.

HIGH AVAILABILITY SOLUTIONS

The increasing reliance on computing in all phases of business is driving the need for increased data availability and integrity, minimization of unplanned downtime and elimination of planned downtime. As shown in the following figure, five tiers of solutions exist today to address system availability issues. The first of these is to provide extremely reliable systems, systems which provide a solid basis for operations. Data integrity and increased data availability can be obtained through the use of disk arrays operating in high availability mode. To further address data availability, the third tier provides disk mirroring functionality through HP DataPair/800. The fourth tier provides nearly continuous system processing with SwitchOver/UX, designed to have a standby system take over for a failed mission-critical primary. The final tier steps above high availability to the realm of fault tolerance, where the Series 1200 fault tolerant systems provide absolute data integrity and continuous system processing. The next several sections will discuss these tiers of availability solutions.

RELIABLE SYSTEMS

HP's strategy for addressing the growing need for increased system availability is to provide flexible, highly available systems. The foundation for the Series 800 Business Servers is its processors and peripherals, which lead the industry in reliability. HP's PA-RISC architecture and advanced VLSI technology dramatically increase reliability by reducing the number of parts that can fail.
Actual field data shows that the MTBF (Mean Time Between Failures) for HP systems exceeds three years and the Series 8x7 systems have a projected MTBF of 4.8 years. System uptime typically exceeds 99.97 percent. Other vendors typically provide 99.5 percent uptime for conventional systems. Although the 0.47 percent difference may appear insignificant, it equates to an additional 41.2 hours of downtime per year compared with the HP Series 800 systems.

Our disks also provide a very high MTBF due to HP's outstanding disk technology. HP manufactures its own drive mechanisms for use both in HP computers and in the computers of other large companies. The quality of drives is now measured in decades between failures; a typical mass storage subsystem can have an MTBF of 100,000 hours, or over 11 years of continuous operation. This is a complete subsystem that includes power supply, controller board, etc. (not just the disk itself).

HP will provide Support Watch, a proactive approach to hardware support. HP Support Watch detects and reports transient hardware faults before they result in unplanned downtime. If a problem is found, the software alerts the HP Response Center before a failure occurs.

HP quality extends to software as well. HP performs extensive testing of all major releases of the HP-UX operating system. The release criteria include certification for functionality, usability, reliability, performance and supportability. The software reliability testing is measured in Continuous Hours of Operation and is accomplished through extensive feature and path flow coverage. In addition, recovery capabilities are built into the leading database management systems with which HP has premier relationships.

To address unforeseen circumstances such as power outages, the Series 800 Business Servers support power-fail auto restart. Power-fail auto restart preserves what is in main memory in the event of a power interruption so that, upon power restoration, normal operations can resume.
This eliminates any data loss (for interruptions of up to 15 minutes). HP offers an Uninterruptible Power Supply (UPS) from a third-party manufacturer, Deltec. The UPS provides a constant flow of refined, regulated, computer-grade power through virtually any utility line disturbance. The Deltec UPS protects hardware and data from blackouts, brownouts, surges and spikes.

Since operator error accounts for a significant amount of downtime, HP is taking steps to minimize operator intervention. For example, the Series 800 Business Servers are shipped as an integrated package which includes the interface cards, operating system and networking software pre-installed. Another feature is the autoconfiguration of the available devices into the system upon boot-up. This reduces the operator involvement in configuring devices during system startup. To effectively manage disk utilization, the HP-UX operating system includes the capability to set disk space limits (disk quotas). This provides a mechanism for controlling and reporting user utilization of disk space. Other software solutions such as HP OmniBack and HP OmniBack/Turbo provide unattended network backup, thereby eliminating the need for operator intervention and the possible operator error that comes with it.

In summary, HP's reliable systems provide a solid foundation to which higher availability solutions can be added as required. This foundation includes PA-RISC architecture, systems and peripherals with high MTBF, HP Support Watch, high quality HP-UX operating system software and power-fail auto restart.

FUTURE STRATEGY

HP will continue to make the Series 800 systems more highly available. The strategies for accomplishing this include online replacement of I/O cards, including disk/tape controllers, data communication links, and multiplexer cards. Online CPU and memory replacement is also being investigated for the multiprocessing systems.
Investigations are also underway to provide a more fault resilient operating system to isolate application failures and alleviate system panics. Additional availability functionality will be provided by incorporating and adding value to emerging standards.

To minimize operator errors, easy-to-use, intuitive graphical user interfaces will play a major role in the evolving application development. In addition, fault isolation and event notification will be incorporated within the evolving systems management solutions to further automate overall operations.

Increased availability can be gained through greater software and application quality. This is especially critical when independently-developed applications are brought together for the first time on a mission-critical system. To address the software development cycle, HP continues to provide high level languages, change management tools, and Computer Aided Software Engineering (CASE) solutions. In addition, HP is investigating online debuggers and editors to facilitate the loading of an application on a running system.

DATA AVAILABILITY

The second tier for providing higher availability is data availability. Since data is one of the most important assets of a corporation, maintaining full access and integrity of the data is critical to operations. There are two approaches to increasing data availability: disk arrays and disk mirroring.

DISK ARRAYS

The primary functions of disk arrays are to increase data availability, to increase total storage capacity, and to provide performance flexibility by selectively spreading data over multiple disk drives. HP's disk arrays protect data and allow uninterrupted access to data in the event of a disk drive failure. The disk array in high availability mode utilizes a separate data protection disk to store an encoded form of data from the other disks in the array.
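The role of the data protection disk can be illustrated with a small parity sketch. This paper does not specify the encoding HP's arrays use; the example assumes bytewise XOR parity, which is the usual scheme for the RAID level 3 mode named in this section. The four data disks and the function names are hypothetical, chosen to match the 25% capacity overhead quoted below.

```python
from functools import reduce


def parity_block(blocks):
    """Bytewise XOR of equal-length blocks; stands in for the protection disk."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing block from the survivors plus the parity block."""
    return parity_block(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    data = [b"DISK", b"ARRA", b"YDAT", b"A001"]  # four hypothetical data disks
    parity = parity_block(data)                  # contents of the protection disk
    failed = 2                                   # pretend the third disk has failed
    survivors = [blk for i, blk in enumerate(data) if i != failed]
    assert rebuild_missing(survivors, parity) == data[failed]
    print("rebuilt:", rebuild_missing(survivors, parity))
```

Because XOR is its own inverse, the same routine serves both to compute the protection data and to rebuild a lost block, which is why a single extra disk can cover the failure of any one data disk but not two at once.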
In the event of a disk drive failure, the encoded disk allows continued access to the data with no loss of system performance. The failed disk may later be replaced on-line and the encoded data will be reconstructed onto the new disk.

HP's high availability disk arrays offer a redundant array of inexpensive disks (RAID) level 3 mode of operation, which offers the best solution to meet HP OLTP multiuser computer system requirements. In high availability mode, the disk array appears to the host system as a single 2.7 or 5.4 gigabyte disk. All the individual mechanisms work in unison on every request from the host. The controller queues requests and executes them sequentially. Since all the mechanisms work together in parallel, the transfer rate of the array can be as high as four times the transfer rate of a single disk. This means that the high availability mode will be most efficient at processing long, large transfers, or on systems with low I/O (transaction rate) demand on disk subsystems.

The protection in the disk array is provided at a lower cost than the fully redundant disks used in disk mirroring solutions, since only 25% more disk space is needed to store the encoded data versus up to 100% more disk space needed to mirror data. However, disk mirroring solutions do offer more data protection in that all cables, interfaces, power supplies and controllers are duplicated.

HP DATAPAIR/800

HP DataPair/800 prevents data loss by maintaining two copies of data on separate disks so that the data is still intact after any single disk or interface card failure. DataPair/800 can mirror any disk partition, including the root and swap partitions. It supports mirroring of raw disk access as well as access through the file system.

Mirroring performance depends upon the mix of disk reads and writes. A mirrored write is slightly slower than an unmirrored write since writes to both disks must be completed before the mirrored write is complete.
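The write behavior just described can be sketched with a toy model of a mirrored pair. This is only an illustration under stated assumptions, not the DataPair/800 driver: the class, its methods, and the latency figures are all hypothetical.

```python
class MirroredPair:
    """Toy model of a two-disk mirror; each disk is a dict of block -> bytes."""

    def __init__(self):
        self.disks = [{}, {}]
        self.online = [True, True]

    def write(self, block, data, latencies=(1.0, 1.0)):
        """Write to every online disk; return the time until the write is done."""
        finished = []
        for i, disk in enumerate(self.disks):
            if self.online[i]:
                disk[block] = data
                finished.append(latencies[i])
        if not finished:
            raise IOError("both copies are offline")
        return max(finished)  # complete only when the slowest copy is written

    def read(self, block):
        """Read from the first online copy."""
        for i, disk in enumerate(self.disks):
            if self.online[i]:
                return disk[block]
        raise IOError("both copies are offline")


if __name__ == "__main__":
    pair = MirroredPair()
    elapsed = pair.write(7, b"payroll", latencies=(8.0, 11.0))  # hypothetical ms
    pair.online[0] = False  # simulate a single disk failure
    print(elapsed, pair.read(7))
```

The model makes the trade-off visible: the write takes as long as the slower of the two disks, while the data remains readable after either single disk is lost.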
Reads are done from the least busy disk and are optimized to the point that the driver can achieve more than 100% of the I/O accesses per second of an unmirrored disk on read applications.

HP DataPair/800 works transparently to the application. The disk mirroring software works with the HP-UX kernel to manage the mirrored disks, so no application modification is required.

HP DataPair/800 allows you to take a disk in a mirrored pair off-line to perform a backup, while applications continue to access data from the on-line disk. As the backup is being performed, the changes which are being made to the on-line disk are recorded in a table in memory. When you bring the disk back on-line after the backup procedure, a fast update is done to return the mirror to a synchronized state. Application continuity is maintained during the synchronization. You can create or delete a mirror at any time without suspending your application, thus allowing dynamic configuration.

HP DataPair allows you to mirror the data which is critical to the user or the application. Partitions of the disk or the entire disk can be mirrored. HP DataPair/800 can be operated either through the menu-driven interface of the HP-UX System Administration Manager (SAM) tool or through HP-UX shell commands.

HP DataPair/800 works with HP Fiber Link (HP-FL) technology. Sets of disks are connected to the system using HP-FL interface cards. This prevents the I/O interface card from becoming the single point of failure.

FUTURE STRATEGY

To address the need for increased data availability, HP will continue to provide additional modes of disk array technology. There are several modes of RAID technology, including a mode to provide mirroring capability.

HP plans to incorporate Open Software Foundation (OSF) technology into the HP-UX operating environment. One such technology is the Logical Volume Manager (LVM). The LVM provides two primary pieces of functionality: disk management flexibility and mirroring capability.
In the area of disk management flexibility, the LVM provides user-defined disk partitions, concatenation of partitions into volumes, file systems and raw partitions that span multiple physical disks, online growth of volumes, and dynamic detection, relocation and repair of bad disk sectors. In the area of high availability, the LVM provides triple mirroring of volumes across SCSI and/or Fiber Link interfaces. Triple mirroring allows mirroring to continue even during an online backup. HP plans to offer the LVM in 1992.

SYSTEM AVAILABILITY

The next tier in providing higher availability is addressing system availability. Data availability solutions do not respond to system interruptions which result from either system hardware or software failures. In the event of a failure at the system level, the response should be automatic and should quickly return operations to normal conditions.

SWITCHOVER/UX

SwitchOver/UX provides near continuous operation of your mission critical computing environment. It is a combination of software functionality and hardware interconnection that enables your computer system, and the applications which run on it, to recover promptly from failure of a system processing unit (SPU).

Using SwitchOver/UX, you can build groups of hosts that provide high system availability. In the event of a processor failure, a standby processor can automatically begin providing the services of the failed processor. The standby processor is connected to the same disks and LAN as the primaries so it can become an identical replacement for the failed primary.

SwitchOver/UX is ideally suited for environments with the following characteristics:

1) Require near-continuous operation.
2) Have a mix of critical and non-critical applications.
3) Have multiple systems networked together.
4) Run transaction-based applications.
5) Utilize industry standard databases.
6) Cannot justify the added cost of fault tolerance.
While most target environments will have these characteristics, SwitchOver/UX is flexible enough to be employed in a wide range of applications.

SwitchOver/UX provides fast recovery without operator intervention. Using a "heartbeat" message to communicate the state of the system, the primary or primaries allow the standby to automatically detect a system failure of a primary and automatically reboot to become an identical replacement for the failed primary. Systems configured with SwitchOver/UX do not require an operator to be present in order to detect or respond to a failure. Since there is no need for operator intervention, the recovery time is significantly decreased.

By having multiple data paths through different processors, LANs and I/O channels along with disk mirroring, there is no single point of failure in a loosely-coupled processor configuration. The individual failure of a component will not bring the system down for an extended period of time. The recovery time for a system is dependent on the system size, configuration and application. The recovery process includes fault detection, system recovery and application recovery. (Refer to Appendix A for details on the recovery process.)

The standby system is active and can be used to provide non-critical services. For example, while the primaries are running critical services in a production environment, the standby could be used as a development system. When the primary fails, the processes on the standby can be shut down gracefully before the standby begins its recovery process. Once the primary is repaired it can be used as a standby to continue development work. By being able to utilize the standby for services which can be interrupted for a short period of time in the event of a processor problem, SwitchOver/UX maximizes your computing investment.

Most database applications are capable of recovering from reboots.
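The heartbeat-based failure detection described earlier in this section can be sketched as a timeout check on the standby. This is a minimal illustration, not the SwitchOver/UX protocol (Appendix A covers the actual operation); the class, the host names, and the 30 second timeout are hypothetical.

```python
class StandbyMonitor:
    """Tracks the last heartbeat seen from each primary (times in seconds)."""

    def __init__(self, primaries, timeout):
        self.timeout = timeout  # hypothetical: longer silence means failure
        self.last_seen = {name: 0.0 for name in primaries}

    def heartbeat(self, name, now):
        """Record a heartbeat message received from a primary at time `now`."""
        self.last_seen[name] = now

    def failed_primaries(self, now):
        """Return the primaries whose heartbeats have been silent too long."""
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


if __name__ == "__main__":
    mon = StandbyMonitor(["primary_a", "primary_b"], timeout=30.0)
    mon.heartbeat("primary_a", now=100.0)
    mon.heartbeat("primary_b", now=100.0)
    mon.heartbeat("primary_a", now=120.0)   # primary_b has gone quiet
    print(mon.failed_primaries(now=140.0))  # primary_b silent for 40 s
```

In a real configuration the timeout would be a trade-off: a short value shortens fault detection time, while a long one avoids a takeover triggered by a briefly delayed message.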
Log files in databases keep track of all transactions that the database has accepted and committed to disk. When a system is forced to reboot, whether due to a system fault, power failure or other incident, the database will lose the current transaction (assuming it has not been completed), but not the committed transactions. As a result, end users only need to check on the last transaction they were entering. They do not need to check on committed transactions because databases are designed to work with the reboot mechanism SwitchOver/UX employs.

After the standby takes over for the failed primary, users need to log back into the system. Users do not need to know that they are logging into a different system. They can use the same procedures they followed to log in the first time (i.e. they do not need to use a different machine name to log in).

FUTURE STRATEGY

In mid-1992, SwitchOver/UX will provide interoperability with the Logical Volume Manager, support for FDDI networks, and operation with both SCSI and HP-FL disks.

To provide a quicker recovery period, a journaled file system is being investigated to speed up the system reboot. A journaled file system keeps a copy of every disk operation that has occurred until the contents of memory have been written to disk. In the event a system fails, only the changes made to the disk since the last update are checked and reconstructed as necessary. This greatly reduces the file system check phase and allows quick file system restarts.

HIGH AVAILABILITY DIRECTIONS

DISTRIBUTED AVAILABILITY

HP recognizes the trend toward distributed data processing in the 1990s. Consequently, the horizon of availability solutions must expand to meet these needs. Based on trends, distributed availability solutions will begin to play a significant role in the 1994-1995 timeframe due to the implementation of Client/Server technology and cooperative processing.
In addition, as business operations span the world, system and data access must be independent of distance and location.

Distributed availability solutions consider a network of systems, whether they are across a site or around the world, as a single environment. The primary goal of distributed availability is to provide transparent data access and transaction routing in the event that any one node, communication link, or site fails.

Similar to system-based high availability requirements, distributed availability must be based on reliable systems and data communications. Of particular importance is the networking of systems and the major reliance on data communications. Through technology defined by the Open Software Foundation (OSF), HP plans to incorporate the distributed networking capabilities. HP currently provides the Network Computing System (NCS), which is part of the OSF Distributed Computing Environment (DCE). NCS facilitates the interconnectivity of systems within a heterogeneous networked environment.

HP also will be providing FDDI capabilities as well as redundant LAN functionality to accommodate network robustness. Using redundant networking, a failure of a communications link does not impact data flow since the networking traffic proceeds through an alternate network link.

OSF has defined a distributed naming service to facilitate the communication of information throughout a networked environment. This will allow users to identify, by name, resources such as servers, files, disks, or print queues and gain access to them without needing to know where they are located in the network. This will work in LAN as well as WAN environments.

Part of the distributed naming service is the Andrew Distributed File System from Transarc. This provides data accessibility across geographic sites while retaining a consistent and uniform naming convention without compromising on performance.
The systems which make up the distributed environment appear as one worldwide file system. HP plans to take advantage of the Andrew Distributed File System in future releases of the HP-UX operating system.

The distributed file system also will be designed to enable administrators to do file system backups while the system is up and running. The plan is to create a replicated copy of the file system, then back up the copy. This will allow users to continue changing the original data while required system administration is performed.

Maintaining data integrity is critical within a distributed environment. The utilization of transaction processing technology will provide transaction commitment and concurrency. Using protocols such as two-phase commit, database vendors will maintain data integrity through the coordination of committed transactions to the participating distributed systems. The two-phase commit protocol provides the capability to either commit or abort a transaction that spans multiple systems. In the event one of the participating systems in the distributed environment fails during the course of a transaction commitment, the total commit will be aborted. This provides transaction integrity across multiple systems. After a failure, recovery can begin by identifying, from the transaction monitor log, the uncommitted transactions. HP plans to take advantage of transaction technology from Transarc to assist in the coordination and commitment of multiple-site transactions.

SUMMARY

In order to meet customers' increasing availability needs, HP will continue its high availability and fault tolerant program through the following directions: continued enhancement of existing availability and fault tolerant products, delivery of additional availability functionality to "round out" the availability offering, and movement toward providing distributed availability solutions to meet the needs of the emerging distributed computing trend.
When possible, these directions will leverage emerging standards either directly or by adding value to them.

The HP9000 systems offer a broad range of high availability and fault tolerant solutions today to minimize or eliminate computing interruption and financial impact due to loss of data or application availability. HP plans to continue to improve these solutions with additional availability functionality, and meet the requirements of distributed computing by providing distributed availability solutions in the future.

FAULT TOLERANT SOLUTIONS

The Series 1200 family is designed for continuous operations and high performance OLTP applications. The family is based on Motorola's 32 bit high performance 68K microprocessors. The two systems, Model 1240 and Model 1245, run the same operating system, HP-FX, and are completely object code compatible.

The Model 1240, first introduced in March 1990, is based on Motorola's 68030 microprocessors and supports 256 Kbytes of cache. The Model 1245, introduced in August 1991, is a board upgrade to the Model 1240. Based on Motorola's fastest microprocessor, the MC68040 (rated at 20 MIPS), the Model 1245 provides up to three times the performance of HP's low-end Fault Tolerant system. To achieve higher performance rates, the Model 1245 supports a larger cache size, which significantly reduces the traffic across the system backplane.

All models have a tightly coupled architecture with processors sharing global memory. This architecture was designed to support symmetric multiprocessing where system performance scales linearly as hardware resources are added.

The systems are modular and expandable. Upgrades and reconfiguration can be done on-line. HP's FT computers are based on open systems, and conform to major industry standards.

Each of the primary system resources (Processor, Memory, or IO Elements) can be added to the system on-line, with no interruption of the application.
Up to 32 Processor Elements can be added to the system, providing up to 600 MIPS. Memory Elements are designed to support high-throughput applications, and can support up to 2GB of shadowed memory (4GB of physical memory). The system supports mainframe type applications, which require large disk farms, high-speed communications interfaces, and support for multiple LANs. The system also supports a large number of terminal-based users.

Designed from inception to be a Fault Tolerant platform, the 1200 system architecture ensures applications and data will always be available. All critical components are duplicated to prevent single points of failure. The system identifies faults within one machine cycle to prevent any data corruption. The failed component is immediately isolated (in case of a permanent failure), and the system transparently returns to normal operation. HP-FX, the 1200's Symmetric Multi Processing operating system, has been designed from the ground up to eliminate malfunctions typical of other UNIX implementations.

The 1200 systems provide an optimum trade-off between hardware and software to assure absolute data integrity. Within one clock cycle the system detects any faults using embedded hardware logic. The failed component is then isolated and the system is logically re-configured using software techniques. This combined hardware/software approach is an efficient way of handling a complex problem that occurs seldom but requires immediate action.

Error detection and correction codes are employed in the main memory, IO and processor cache and in the data buffers on the system bus, to ensure the integrity of data throughout the system. Protocol monitors are employed by each element involved in a bus transfer to ensure data integrity on the bus. In addition, timers on the processor boards monitor the elapsed time between various system events, and generate an alarm if the interval exceeds normal limits.

Self-checking circuitry checks all outputs.
For example, each processor element consists of two processor chips with comparator logic onboard. All instructions are executed on both chips, and the results are compared for consistency. IO elements also have duplicate processors that check each other. Even the error detection H/W is checked to ensure it functions properly (who guards the guard?).

One of the key differentiators of Models 1240/45 is the ability to deal with both permanent and transient errors. Other systems often treat transient errors as software errors when in fact they usually are caused by thresholding hardware. The capability to deal with transient errors allows the resources of Models 1240/45 to be used much longer than if they were quickly shut down at the first trace of a fault.

After the hardware detects and isolates an error, HP-FX performs the "intelligent" recovery operations. Recovery operations determine whether the error is permanent or transient. If it is permanent, or if the component has exceeded its pre-established threshold, the system sends the process to the next available processor. The system relies on duplicate copies of each process's state in main memory. When a failure occurs, the system can use this state to restart the process from the point of the last cache flush. If it is a transient error, the module is placed back in service.

The system can withstand multiple processor and memory module failures. If other components of the system fail, the system will run, but it cannot withstand another failure in that area. Errors are logged to disk and reported to the system administrator. A repair request will also be automatically transmitted to the HP Response Center.

Fault Tolerance is transparent to application developers. HP-FX has been designed from the ground up to prevent system failures due to malfunction in the system S/W. Other Fault Tolerant UNIX systems, based on AT&T's UNIX Kernel, do not address all the aspects of S/W reliability.
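The permanent-versus-transient recovery decision described in this section can be illustrated with a small model. This is only a sketch under stated assumptions, not HP-FX code: the class name `FaultPolicy`, the parameter `transient_threshold`, the component labels and the returned action strings are all hypothetical, and the text does not say how the pre-established threshold is actually counted.

```python
from collections import Counter


class FaultPolicy:
    """Hypothetical model of the recovery decision; not HP-FX code."""

    def __init__(self, transient_threshold: int):
        # Assumed: number of transient errors tolerated per component.
        self.transient_threshold = transient_threshold
        self.transient_counts = Counter()
        self.isolated = set()

    def handle_error(self, component: str, permanent: bool) -> str:
        """Return the action taken for one detected error."""
        if not permanent:
            self.transient_counts[component] += 1
        over_threshold = (
            self.transient_counts[component] > self.transient_threshold
        )
        if permanent or over_threshold:
            # Component leaves service; its work moves to another processor.
            self.isolated.add(component)
            return "isolate"
        # Transient error under the threshold: component returns to service.
        return "return_to_service"


policy = FaultPolicy(transient_threshold=2)
actions = [policy.handle_error("PE3", permanent=False) for _ in range(3)]
permanent_action = policy.handle_error("ME1", permanent=True)
print(actions, permanent_action)
```

In this model a component that keeps producing transient errors is eventually treated like a permanently failed one, which matches the threshold behavior the text describes.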
HP-FX has been redesigned to improve on existing UNIX technology. From its inception, UNIX was not built as a fault tolerant operating system, nor has it been architected to support Symmetric Multi Processing. Therefore, HP-FX's Kernel was redesigned to enable the fault tolerant features needed for OLTP computing environments.

At any time, the system may experience a hardware fault in a PE, ME, IOE, system bus, controller, or peripheral device. Whenever a fault occurs, it must be possible to recover the process that experienced the fault without losing its process state or losing or duplicating I/O operations. To ensure this, the kernel guarantees that the state of every process in main memory is always internally consistent; that the state of kernel data structures is consistent with the main memory process state; and that no I/O operations are lost or repeated.

Structured coding practices were put in place to help improve kernel source readability, and thus expedite bug isolation and system maintenance. These practices are enforced through software tools, used to guarantee consistency of HP-FX code. Hardware dependent code has been separated and is clearly isolated from generic hardware independent code, thus increasing ease of defect isolation and repair. Additionally, new software modifications may be more easily inserted. New hardware technologies require re-work of only the hardware dependent code.

Models 1240/45 go beyond hardware fault tolerance by providing features to minimize the chance of the system failing due to software errors. The primary feature is the run-time assertions. Originally added to aid debugging, the HP-FX run-time assertions detect inconsistent process states. Should a comparison with an assertion reveal an inconsistency, the OS will retry the last operation.

APPENDIX A SwitchOver/UX Product Operation

SwitchOver/UX consists of a set of programs running on the primary and the standby hosts.
(A host is the collection of programs and data used by a processor.) There are two "daemon" programs which are used to monitor the state of the primary hosts. (A daemon program is one which runs automatically in the background to provide certain system services.) A primary host in a highly available group runs a daemon called heartbeat. This program periodically transmits a message to the standby host over the local area network. The standby host runs a daemon called readpulse, which "listens" for the heartbeat messages.

There are five phases to SwitchOver/UX's operation:

1) Normal health checking
2) Fault detection and recovery
3) Application recovery
4) Resume processing
5) Processor repair

Normal Health Checking

During normal operation, each primary in a loosely-coupled processor group sends out a "heartbeat" across the LAN to the standby system, informing it that everything is functioning properly (Figure 1). The standby system "listens" for these heartbeats, and as long as it receives them, it will continue to run its own applications.

FIGURE 1

Fault Detection and Recovery

When the standby misses a "heartbeat" from a primary processor, it assumes the primary has failed and begins the recovery process (Figure 2). First, the standby processor locks the root disk of the primary processor. This prevents the primary processor from inadvertently accessing the disks and possibly corrupting data once the standby has assumed the responsibilities of the failed primary. Next, the standby reboots itself using the root of the failed system. When the standby finishes rebooting, it will also have the network address of the failed system.

NOTE: Once the standby initiates the recovery process, it cannot be reversed until the processor recovery is completed. Also, any disks the standby was accessing prior to recovery will not be accessible until the failed primary is repaired.
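The health-checking logic on the standby side can be sketched as follows. This is an illustrative model only, not the actual readpulse daemon: the class name `PulseMonitor` and the parameters `interval_s` and `max_missed` are hypothetical stand-ins for the two administrator-definable settings (heartbeat frequency and the number of heartbeats allowed to go by), and timestamps are passed in explicitly so the example needs no network.

```python
from dataclasses import dataclass


@dataclass
class PulseMonitor:
    """Hypothetical standby-side tracker for one primary host."""
    interval_s: float         # assumed: seconds between heartbeats
    max_missed: int           # assumed: heartbeats allowed to go by
    last_beat_s: float = 0.0  # arrival time of the latest heartbeat

    def record_heartbeat(self, now_s: float) -> None:
        """Note that a heartbeat message arrived at time now_s."""
        self.last_beat_s = now_s

    def missed_count(self, now_s: float) -> int:
        """Whole heartbeat intervals elapsed since the last heartbeat."""
        elapsed = max(0.0, now_s - self.last_beat_s)
        return int(elapsed // self.interval_s)

    def should_take_over(self, now_s: float) -> bool:
        """True once the allowed number of heartbeats has gone by."""
        return self.missed_count(now_s) >= self.max_missed


monitor = PulseMonitor(interval_s=10.0, max_missed=3)
monitor.record_heartbeat(100.0)
print(monitor.should_take_over(125.0))  # two intervals elapsed
print(monitor.should_take_over(130.0))  # three intervals elapsed
```

Under this model the worst-case detection delay is roughly `interval_s * max_missed` seconds, so shortening either setting speeds up detection at the cost of more false takeovers on a briefly congested LAN.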
FIGURE 2

Application Recovery

After the standby system has finished assuming the identity of the failed system, the applications need to go through their own recovery routines. Most databases support recovery from a reboot and can automatically bring themselves to a consistent state. Custom applications need to be structured to recover from a reboot. Applications can be automatically restarted through the use of HP-UX's standard initialization and start-up files.

Resume Processing

Users log back into the system using their normal procedure, and re-start their individual applications. Users do not need to know that they are running on a different processor (Figure 3). For most database applications, users may need to check on their last transaction before processing was interrupted. Typically, the current transaction will be lost when processing is interrupted. The amount of data loss will depend upon the individual application and database. Batch applications which are interrupted will also need to be re-started. Depending upon the recovery capabilities of the application, it will need to be re-started either from the beginning or from a point where the state of the program and data are consistent.

FIGURE 3

SPU Repair

After the standby has assumed the responsibilities of the failed primary, it becomes the primary and the failed primary is treated as a standby system. The failed system is then repaired using HP's normal repair processes with the assistance of a system administrator or repair technician. Once the system is repaired, it can be brought up as a standby for the other primaries or it can resume being a primary (Figure 4). In order to return the failed primary to being a primary again, the standby which took over for the primary will need to be shut down so the primary can regain its disks and network address. This switchover should be scheduled when the system is lightly loaded and users can afford the brief downtime.
FIGURE 4

Recovery Time

The recovery time for a system is very application dependent. Figure 5 diagrams the recovery process and shows which stages are time dependent upon customer applications. There are three key components to recovery time:

1) Fault detection
2) System recovery
3) Application recovery

FIGURE 5

Fault Detection

Fault detection time is determined by the frequency of the heartbeats and how many heartbeats the standby will allow to go by before it initiates recovery procedures. These two parameters are definable by the system administrator. Typically this stage of the recovery process will take less than one minute.

System Recovery

During system recovery the standby processor reboots and checks all disk file systems to correct any problems that may have been created when the primary processor failed. The reboot time is dependent upon the class of machine (e.g. 827, 832) and the amount of RAM. The disk checking time depends on how much disk space is used for the file system and the state these files were in at the time of the failure. This component may be the largest part of the recovery time. It may be significantly reduced by using databases and applications which access the disks directly and bypass the HP-UX file system. The most recent versions of industry leading database products typically bypass the file system (using raw disk), which helps to minimize recovery times.

Application Recovery

Once the processor has recovered and the file system is intact, the application needs to perform its own recovery. For databases this would mean rolling transactions back to a known state and then rolling them forward, completing all committed transactions and discarding all incomplete transactions. The time it takes for this stage is entirely dependent upon the application and the transaction rate prior to the failure.
This portion of the recovery time can be minimized by choosing databases and structuring applications to recover quickly from system reboots.

SwitchOver/UX Configuration with DataPair/800

Disk Mirroring (DataPair/800, Product Number 92625A) is not required for a SwitchOver/UX system, although it is recommended to maximize your data availability and integrity in a highly available environment. SwitchOver/UX and HP DataPair/800 require Fiber-Link ("FL") disks. Any Fiber-Link Interface disks may be used, though only disks with the same product numbers may be mirrored. Up to eight disks can be connected in one P-bus chain. The P-bus cable chains the disks together, providing the data path for the standby processor to access the files on the primary processor's disk(s) if a primary fails.

Configurations

SwitchOver/UX supports configurations of one processor acting as a standby for up to seven primary processors. SwitchOver/UX operates on systems within the same HP 9000 Series 800 system categories. Each loosely-coupled processor group must have processors which can boot off the same root and have identical I/O configurations. There are five system categories. Systems within the same category can be used together to form loosely-coupled processor groups. These categories are:

(1) (2) (3) (4) (5) 827S 822S 825S 845S 850S 847S 832S 835S 855S 857S 842S 860S 867S 852S 865S 877S 870/100 870/200 870/300 870/400

Note: SwitchOver/UX is not supported on the 807S, 817S, 837S, 808S and 815S Systems.

APPENDIX B SWITCHOVER/UX QUESTIONS AND ANSWERS

General Definitions

SPU - the box containing the processor.
Host - the disks and the set of programs run by the SPU.
Asymmetric configuration - the primary SPUs are not necessarily connected to any disks other than their own. The standby SPU is connected to all disks.
Symmetric configuration - each SPU is connected to all disks.
Primary Host - Composed of programs and data that are indispensable.
Standby Host - Composed of tasks that may be interrupted at any moment without causing serious difficulty.

SwitchOver/UX Questions:

Q: How similar must the primary and standby hosts be? For example, must they have the same disks, I/O cards, etc.?

A: There are two types of SwitchOver/UX configurations: symmetrical and asymmetrical. In the symmetrical configuration all SPUs have access to all disks. Any one of the SPUs can be the designated standby system. Inherent in this configuration, the card layout must be the same. There can be a varying number of disks associated with the various SPUs. In the asymmetrical configuration, there is a designated standby system which has access to all disks in the configuration. The I/O configuration of the standby SPU must be a superset of all the primary SPUs it is backing up.

Q: What happens when the LAN fails?

A: There are two scenarios based on the customer configuration.

Scenario 1: Customer has one LAN connection. In this case, the heartbeat daemon running on the primary will no longer be able to transmit its message across the LAN. Similarly, the readpulse daemon running on the standby will no longer hear messages from the primary. The standby will assume the primary has failed and begin the takeover process.

Scenario 2: Customer has two LAN networks. When there are two LANs within the configuration, the heartbeat messages from the primary are passed over BOTH LAN networks at all times. The standby is listening for heartbeat messages from either LAN. If one of the LANs fails, the heartbeat messages will continue to be transmitted over the remaining LAN. The systems will both continue to operate without interruption. Note that the peripheral devices (DTCs, workstations, etc.) which are connected to the failed network may need to be switched.

Q: After a failover how do you restore your switchover configuration?

A: The process differs based on the configuration.
Symmetrical configuration: Boot the standby host on the repaired primary SPU. This results in the repaired primary SPU becoming the standby SPU.

Asymmetrical configuration: You will need to shut down the SPU running the primary host (the prior standby) and reboot the repaired primary SPU to run the primary host again. Then, boot the standby SPU to run the standby host. Since only the standby SPU has full access to all disks within the configuration, this is your only designated standby SPU.

Q: What are the customer deliverables with SwitchOver/UX?

A: Customers will receive one copy of SwitchOver/UX software, two unique link level addresses and the Managing SwitchOver/UX Manual per SwitchOver/UX order (P/N 92668A). The two unique link level addresses are used to reset the hardware link level address associated with the LAN card. It is necessary for SwitchOver/UX to operate with a software link level address rather than a hardware link level address. Since the standby SPU takes over the disks of the failed primary, the link level address must be associated with the host rather than the SPU. Two addresses are supplied to accommodate configurations with two LAN networks.

Q: Can you share disks within the SwitchOver/UX configuration?

A: No, you cannot share data disks within your configuration. The only disk which can be shared in the SwitchOver/UX configuration is the dump disk. This disk can only be used as the dump disk; there can be no data residing on this disk when in shared mode.

Q: Do I need the same user license level on the primaries and the standby?

A: You do not need to have identical user license levels on all systems within the configuration. In the event of a switchover, the standby system will assume the disks of the failed primary and therefore the user license will be that of the failed primary host.
If users who had been operating on the standby system want access to a system, they will need to log on to one of the operational primary systems (based on user license availability).

Q: After the switchover has occurred, is the standby host available?

A: No, the disks and programs previously running on the standby host are not available until the configuration is restored.

Q: Are there restrictions on the systems within a SwitchOver/UX Configuration?

A: Yes, there are five unique categories of systems for the SwitchOver/UX configuration:

(1) (2) (3) (4) (5) 827A 822S 825S 845S 850S 847S 832S 835S 855S 857S 842S 860S 867S 852S 865S 877S 870/100 870/200 870/300 870/400

All systems within the SwitchOver/UX configuration must be within the same category. For example, a configuration may consist of an 832S and an 842S. It cannot also include an 845S or 850S; it can only consist of 822S, 832S, 842S and 852S systems.

Q: How do I switch printers/tapes from one to the other?

A: With the DTC Device Access/ARPA software, printers can be shared among the systems within the configuration. This allows continued access to printers and terminals even after a switchover. HP-IB tape drives can be manually switched using the 93550A with an HP-IB switch module. Contact the Sales Response Center regarding this product. SCSI tape drives cannot be switched.

Q: How does SwitchOver/UX work with PLCs, SCADA systems, and other data gathering devices?

A: The integrity of data with SwitchOver/UX is up to the last committed transaction. If the PLC, SCADA or other data gathering device is in the process of transmitting data to the S800 system when a failure occurs, any data which has not been committed will be lost. The data gathering devices often have limited memory and therefore the transmission of data to the S800 may be frequent. There could be a user written check on the data gathering device to determine if the data has been received by the S800; this could have an impact on performance.
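The user written check mentioned in the last answer could take many forms. One possibility, shown here only as a sketch, is for the data gathering device to hold each reading until the S800 acknowledges it, so that unacknowledged readings can be sent again after a switchover. The class `AckBuffer`, its methods and the sequence-number scheme are all hypothetical; nothing in SwitchOver/UX provides or requires them.

```python
from collections import OrderedDict


class AckBuffer:
    """Hypothetical device-side buffer of readings awaiting host acknowledgment."""

    def __init__(self, capacity: int):
        # Assumed: the device can hold only a limited number of readings.
        self.capacity = capacity
        self.pending = OrderedDict()  # sequence number -> reading
        self.next_seq = 0

    def submit(self, reading):
        """Store a reading before sending it; return its sequence number."""
        if len(self.pending) >= self.capacity:
            raise RuntimeError("buffer full: host has not acknowledged data")
        seq = self.next_seq
        self.pending[seq] = reading
        self.next_seq += 1
        return seq

    def acknowledge(self, seq) -> None:
        """Drop a reading once the host confirms it was committed."""
        self.pending.pop(seq, None)

    def unacknowledged(self):
        """Readings to retransmit, oldest first, e.g. after a switchover."""
        return list(self.pending.items())


buf = AckBuffer(capacity=3)
first = buf.submit(20.5)
second = buf.submit(21.0)
buf.acknowledge(first)
print(buf.unacknowledged())
```

The capacity limit reflects the limited memory of such devices, and the extra acknowledgment round trip is the performance cost the answer warns about.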
Q: How far apart can the P-bused disks be? How far apart can SwitchOver/UX systems be?

A: 1000 meters (500 meters from one SPU to the disks in the center; 500 meters from the disks to the other SPU).

Q: Will SwitchOver/UX run on the 700's and 300's?

A: No, there are no plans at this time.

Q: With SwitchOver/UX, is data in memory lost?

A: If the secondary takes over, the data in memory is lost. In situations where power is lost to both the primary and the secondary, the battery backup will kick in, and primary memory will not be lost.

Q: Does SwitchOver/UX work in a Wide Area Network (WAN)?

A: No, not with the current product implementation. It is not possible for an X.25 modem to change its address in a way analogous to the Ethernet address switch technique.

Q: Can there be more than one standby SPU in a loosely-coupled processor group?

A: No, currently only one SPU can be configured to support a set of primaries.

Q: How much overhead is added due to the state-of-health (heartbeat) daemon?

A: The state-of-health daemon uses much less than 1% of the SPU's resources and LAN bandwidth.

Q: What happens to the processes executing on the standby when automatic recovery starts?

A: The default method of handling these processes is to terminate them immediately. This is done in order to speed up the recovery time. It is also possible to gracefully shut down the standby system by modifying the scripts provided with SwitchOver/UX. With these modifications, all processes on the standby can be cleanly stopped before the standby initiates a reboot.

Q: Does SwitchOver/UX work with SQL databases?

A: SwitchOver/UX is designed to work with industry standard databases like Oracle, Informix, Sybase, Ingres, Allbase, etc. Some databases will perform better than others with SwitchOver/UX because they can be configured to reduce recovery times. Databases which support direct disk access and user definable recovery times work best with SwitchOver/UX. Informix, Oracle and Sybase access the disk directly.
Also, some databases have parameters which can be set to limit the time needed to recover the database after an SPU reboot.

Q: How does this product impact applications?

A: In general, applications do not need to be modified to take advantage of SwitchOver/UX. To reduce data loss or speed up recovery times, you may choose to modify your applications. For example, to reduce data loss, the application can be written to prevent the user from entering new transactions until the previous transaction is committed. This technique would eliminate data loss and speed up recovery, although it would reduce throughput.

Q: In a SwitchOver/UX configuration with disk mirroring on the primary, when the standby takes over, are the disks re-imaged?

A: Upon reboot, the disks will be checked to determine if they were previously "in synch". If they were and there is no file corruption, no re-synching is required and none will be done.

Q: When using SwitchOver/UX, can I have something, such as an operator phone call, initiated when the primary system has failed?

A: Yes. When the primary system has failed, a script file can be invoked on the standby system. This script can be customized to the specific requirements of your environment.

Q: When will SwitchOver/UX and DataPair/800 support SCSI?

A: SwitchOver/UX will support SCSI disks with HP-UX Release 9.0. This is planned for release mid 1992. HP DataPair/800 will not be enhanced to support SCSI disks. Logical Volume Manager (LVM) will provide SCSI and HP-FL disk mirroring functionality.